Limitations of Non Model-Based Recognition Schemes


1 Introduction 

Recognizing objects in images is one of the most important aspects of visual perception. 
From the computational point of view, object recognition is also one of the most difficult 
problems. The difficulties are due to the fact that different images of the same object 
may be very dissimilar. The image of an object depends not only on the object’s shape, 
but also on the viewing position, the illumination conditions, other objects in the scene, 
and noise. 

Several approaches have been proposed to deal with this problem of dissimilarity between different views of the same object. In general, these approaches can be classified as model-based or non model-based schemes. In this paper we examine the limitations of non model-based recognition schemes.

Let us begin with some basic definitions. A recognition function is a function from 2-D images to a space with an equivalence relation. Without loss of generality we can assume that the range of the function is the real numbers, R. We define a consistent recognition function for a set of objects to be a recognition function that has the same value on all images of the same object from the set. That is, let $s$ be the set of objects that $f$ has to recognize. If $v_1$ and $v_2$ are two images of the same object from the set $s$, then $f(v_1) = f(v_2)$.

A recognition scheme is a general scheme for constructing recognition functions for particular sets of objects. It can be regarded as a function from sets of 3-D objects to the space of recognition functions. That is, given a set of objects, $s$, the recognition scheme, $g$, produces a recognition function, $g(s) = f$. The scope of the recognition scheme is the set of all the objects that the scheme may be required to recognize. In general, it may be the set of all possible 3-D objects. In other cases, the scope may be limited, e.g., to 2-D objects, or to faces, or to the set of symmetric objects. A set $s$ of objects is then selected from the scope and presented to the recognition scheme. The scheme $g$ then returns a recognition function $f$ for the set $s$. A recognition scheme is considered consistent if $g(s) = f$ is consistent on $s$ as defined above, for every set $s$ from the scheme's scope.
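The two definitions can be made concrete with a small sketch. It assumes, hypothetically, that each object comes with a finite sample of its images, so consistency is only checked on those samples (it can never be proved for all viewing positions this way). The names `is_consistent` and `is_consistent_scheme`, and the string encoding of images, are illustrative and not from the text.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Image = Hashable                         # any hashable image encoding (assumption)
RecognitionFunction = Callable[[Image], float]
Views = Dict[str, List[Image]]           # object name -> sampled images of it


def is_consistent(f: RecognitionFunction, views: Views) -> bool:
    """f is consistent on the sample if it takes one value on each object's images."""
    return all(len({f(v) for v in images}) <= 1 for images in views.values())


def is_consistent_scheme(
    g: Callable[[frozenset], RecognitionFunction],
    sets: Iterable[frozenset],
    views: Views,
) -> bool:
    """g is consistent on the sample if g(s) is consistent on s for every listed set s."""
    return all(is_consistent(g(s), {obj: views[obj] for obj in s}) for s in sets)


views = {"a": ["a0", "a1"], "b": ["b0", "b7"]}
by_name = lambda img: float(ord(img[0]))    # depends only on the object's name
by_view = lambda img: float(img[1])         # depends on the view index
print(is_consistent(by_name, views), is_consistent(by_view, views))  # True False
print(is_consistent_scheme(lambda s: by_name, [frozenset({"a"}), frozenset({"a", "b"})], views))  # True
```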

A model-based scheme produces a recognition function $g(s) = f$ that depends on the set of models. That is, there exist two sets $s_1$ and $s_2$ such that $g(s_1) \neq g(s_2)$, where the inequality is a function inequality. Note that the definition of a model-based scheme in our discussion is quite broad; it does not specify the type of models or how they are used.
The nonlinear RBF interpolation scheme (Poggio & Edelman 1990) is an example of a model-based recognition scheme. In this scheme an image is represented as a point in some high-dimensional space. A model is constructed from a set of images of a given object. The model defines a nonlinear subspace which is an interpolation of the model images in the high-dimensional space. Given a new image, the recognition scheme computes whether the image belongs to one of the known subspaces. In this scheme, the subspaces that the function computes depend on the set of objects that have to be recognized and on the images that have been presented; therefore, the function value for a given image depends on the set of objects learned by the scheme. The schemes developed by Brooks (1981), Bolles & Cain (1982), Grimson & Lozano-Perez (1984), Grimson & Lozano-Perez (1987), Lowe (1985), Huttenlocher & Ullman (1987) and Ullman (1989) are also examples of model-based recognition schemes.
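To see concretely why an interpolation scheme of this kind is model-based, here is a minimal Gaussian RBF interpolant in NumPy. It is a simplified stand-in, not the Poggio & Edelman network: images are assumed to be already encoded as vectors of feature-point coordinates, the coefficients are fitted so that the learned function equals 1 on every model view of one object, and the width `sigma` and the function names are hypothetical. Changing the model views changes the fitted coefficients, and hence the recognition function itself.

```python
import numpy as np


def fit_rbf(model_views: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Solve K c = 1, where K_jk = exp(-||v_j - v_k||^2 / (2 sigma^2))."""
    diffs = model_views[:, None, :] - model_views[None, :, :]
    K = np.exp(-np.sum(diffs**2, axis=-1) / (2 * sigma**2))
    return np.linalg.solve(K, np.ones(len(model_views)))


def rbf_response(image: np.ndarray, model_views: np.ndarray,
                 coeffs: np.ndarray, sigma: float = 1.0) -> float:
    """Interpolated response: exactly 1 on the stored views, near 0 far from them."""
    d2 = np.sum((model_views - image) ** 2, axis=-1)
    return float(coeffs @ np.exp(-d2 / (2 * sigma**2)))


rng = np.random.default_rng(0)
views = rng.normal(size=(5, 6))  # 5 model views, each 3 image points (x, y)
c = fit_rbf(views)
print(rbf_response(views[2], views, c))  # approximately 1.0
```

Thresholding the response (for example, accepting values near 1) gives a recognizer for this one object; with several objects, one interpolant per object would be fitted.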

A non model-based recognition scheme produces a recognition function $g(s) = f$ that does not depend on the set of models. That is, if $g$ is a non model-based recognition scheme, then for every two sets $s_1$ and $s_2$, $g(s_1) = g(s_2)$, where the equality is a function equality.
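The distinction can be illustrated with two toy schemes. In the sketch below, which is only an illustration under stated assumptions, images are fixed-length feature vectors and a model is a dictionary of stored views per object (a hypothetical format). The non model-based scheme ignores its argument and always returns the same function; the model-based one returns a nearest-neighbour matcher over the stored views. Function inequality is only witnessed on a finite probe set: differing on a probe proves $g(s_1) \neq g(s_2)$, while agreeing on the probes proves nothing.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Models = Dict[str, List[Vector]]  # object name -> stored model views (hypothetical format)


def non_model_based_scheme(models: Models) -> Callable[[Vector], float]:
    """Ignores `models`, so g(s1) and g(s2) are the same function for all s1, s2."""
    def f(image: Vector) -> float:
        return float(sum(image))  # an arbitrary fixed function of the image
    return f


def model_based_scheme(models: Models) -> Callable[[Vector], float]:
    """Returns the index (in sorted name order) of the object owning the nearest stored view."""
    names = sorted(models)

    def f(image: Vector) -> float:
        _, index = min(
            (math.dist(image, view), i)
            for i, name in enumerate(names)
            for view in models[name]
        )
        return float(index)
    return f


def differ_on(f1, f2, probes: List[Vector]) -> bool:
    """A witness of function inequality: some probe image gets different values."""
    return any(f1(p) != f2(p) for p in probes)


s1 = {"cube": [(0.0, 0.0)], "pyramid": [(10.0, 10.0)]}
s2 = {"apple": [(10.0, 10.0)], "box": [(0.0, 0.0)]}
probes = [(0.0, 0.0), (9.0, 9.0)]
print(differ_on(model_based_scheme(s1), model_based_scheme(s2), probes))          # True
print(differ_on(non_model_based_scheme(s1), non_model_based_scheme(s2), probes))  # False
```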

Non model-based approaches have been used, for example, for face recognition. In this 
case the scope of the recognition scheme is limited to faces. These schemes use certain 
relations between facial features to uniquely determine the identity of a face (Kanade 
1977, Cannon et al. 1986, Wong et al. 1989). In these schemes, the relations between the 
facial features used for the recognition do not change when a new face is learned by the 
system. Other examples are schemes for recognizing planar curves (Lin 1987, Cyganski 
et al. 1987). 

In this paper we consider the limitations of non model-based recognition schemes. A consistent non model-based recognition scheme produces the same function for every set of models. Therefore, the recognition function must be consistent on every possible set of objects within the scheme's scope. Such a function is universally consistent, that is, consistent for all objects in its scope.

A consistent recognition function of the set $s$ should be invariant to at least two types of manipulations: changes in viewing position, and changes in the illumination conditions. We first examine the limitations of non model-based schemes with respect to viewing position, and then with respect to illumination conditions.

In examining the effects of viewing position, we will consider objects consisting of a discrete set of 3-D points. The domain of the recognition function consists of all binary images resulting from scaled orthographic projections of such discrete objects onto the plane. We show (Section 2) that every consistent universal recognition function with respect to viewing position must be trivial, i.e., a constant function.¹ Such a function does not make any distinctions between objects, and therefore cannot be used for object recognition. On the other hand, we show (Section 6) that in a model-based scheme it is usually possible to define a nontrivial consistent recognition function.

¹ A similar result has been independently proved by Burns et al. (1990) and Clemens & Jacobs (1990).

The human visual system, in some cases, misidentifies an object from certain viewing 
positions. We therefore consider recognition functions that are not perfectly consistent. 
Such a recognition function can be inconsistent for some images of objects taken from 
specific viewing positions. In Section 3.1 we show that such a function must still be 
constant, even if it is inconsistent for a large number of images (we define later what we 
consider “large”). We also consider (Section 3.2) imperfect recognition functions where 
the values of the function on images of a given object may vary, but must lie within a 
certain interval. 

Many recognition schemes deal with a limited scope of objects, such as cars, faces or industrial parts. In this case, the scheme must recognize only objects from a specific (possibly infinite) class of objects. For such schemes, the question arises of whether there exists a nontrivial consistent function for objects from the scheme's scope. The function can then have arbitrary values for images of objects that do not belong to the class. The existence of a nontrivial consistent function for a specific class of objects depends on the particular class in question. In Section 4 we discuss the existence of consistent recognition functions with respect to viewing position for specific classes of objects. In Section 4.1 we give an example of a class of objects for which every consistent function is still a constant function. In Section 4.2 we discuss infinite classes of objects for which a nontrivial consistent function does exist. In Section 4.2 we also define the notion of the discrimination power of a function, which determines the set of objects that can be discriminated by a recognition scheme. We show that, given a class of objects, it is possible to determine an upper bound for the discrimination power of any consistent function for that class. We use as an example the class of symmetric objects (Section 4.2.2).

Finally, we consider grey level images of objects that consist of n small surface patches 
in space (this can be thought of as sampling an object at n different points). We show that 
every consistent function with respect to illumination conditions and viewing position 
defined on points of the grey level image is also a constant function. 

We conclude that every consistent recognition scheme for 3-D objects must depend strongly on the set of objects learned by the system. That is, a general consistent recognition scheme (a scheme that is not limited to a specific class of objects) must be model-based. In particular, the invariant approach cannot be applied to arbitrary 3-D objects viewed from arbitrary viewing positions. However, a consistent recognition function can be defined for non model-based schemes restricted to a specific class of objects. An upper bound for the discrimination power of any consistent recognition function can be determined for every class of objects.





It is worth noting here the difference between the existence of features that are invariant to viewing position or illumination conditions and the existence of consistent recognition functions. An example of an invariant feature is parallelism: under orthographic projection, parallel lines remain parallel from every viewing position. Other examples of features invariant to viewing position can be found in Verri & Yuille (1986) and Ponce et al. (1987). However, a function that merely detects invariant features (without recognizing different objects) can be regarded as a consistent recognition function that must recognize only a given set of objects, and the general results are not applicable to this case (see Section 6).

Many applications of functions that are invariant to viewing position for images of 2-D objects can be found in the literature, such as Fourier transform invariants (Lin 1987) or moment invariants (Hu 1961, Hu 1962, and Khotanzad & Hong 1990). Our analysis applies to the general case of 3-D objects and hence does not contradict these results. The analysis of the class of 2-D objects (Lamdan & Wolfson) shows that for 2-D (pointwise) objects a nontrivial invariant function exists. Therefore, a non model-based recognition scheme with a limited scope of 2-D point objects can be defined.


2 Consistent recognition function with respect to viewing position

We begin with the general case of a universally consistent recognition function with 
respect to viewing position, i.e. a function invariant to viewing position of all possible 
objects. The function is assumed to be defined on the orthographic projection of objects 
that consist of points in space. 

Claim 1: Every function that is invariant to viewing position of all possible objects is 
a constant function. 

Proof: A function that is invariant to viewing position by definition yields the same value for all images of a given object. Clearly, if two objects have a common orthographic projection, then the function must have the same value for all images of these two objects.

We define a reachable sequence to be a sequence of objects such that every two successive objects in the sequence have a common orthographic projection. The function must have the same value for all images of objects in a reachable sequence. An object is said to be reachable from a given object if there exists a reachable sequence starting at the given object and ending at it. Clearly, the value of the function is identical for all images of objects that are reachable from a single object.
Figure 1: Two perspective views of the same three-dimensional object. (a) The object viewed from 0 deg. (b) The object viewed from 5 deg about its vertical axis.


Every image is an orthographic projection of some 3-D object. In order to prove that 
the function is constant on all possible images, all that is left to show is that every two 
objects are reachable from one another. This is shown in Appendix 1. □ 
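Appendix 1 builds such sequences from one elementary step: viewed along the vector connecting two points, the two points project to a single image point. The sketch below, assuming scaled orthographic projection onto the plane perpendicular to a unit viewing direction, checks that an object with $n-1$ points and the same object with one extra point placed on the viewing ray through an existing point give the same binary image. The helpers `project` and `same_binary_image` are illustrative.

```python
import numpy as np


def project(points: np.ndarray, direction: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Scaled orthographic projection of Nx3 points onto the plane normal to `direction`."""
    v = direction / np.linalg.norm(direction)
    helper = np.array([1.0, 0.0, 0.0]) if abs(v[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u1 = np.cross(v, helper)
    u1 /= np.linalg.norm(u1)
    u2 = np.cross(v, u1)  # (u1, u2, v) is an orthonormal frame
    return scale * points @ np.stack([u1, u2], axis=1)


def same_binary_image(p: np.ndarray, q: np.ndarray, tol: float = 1e-9) -> bool:
    """Binary images agree if every image point of p is near one of q, and vice versa."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return bool(np.all(d.min(axis=1) < tol) and np.all(d.min(axis=0) < tol))


rng = np.random.default_rng(1)
obj_a = rng.normal(size=(4, 3))  # an object with n - 1 = 4 points
view = rng.normal(size=3)
extra = obj_a[0] + 2.5 * view / np.linalg.norm(view)  # new point on the ray through obj_a[0]
obj_b = np.vstack([obj_a, extra])  # an object with n = 5 points

print(same_binary_image(project(obj_a, view), project(obj_b, view)))  # True
```

From a generic other direction the extra point becomes visible, so the two objects are distinct; they are nevertheless linked in a reachable sequence by this one shared view.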

An example that demonstrates how a box-like object is reachable from a pyramid was given by Ullman (1977) (see Fig. 1). A wire object is shown whose projection in one direction is a triangle, and in another direction only 5 degrees apart (see the shadows in Fig. 1) is a square. Hence, any invariant function must have the same value for a box and a pyramid.

We have shown that any universal and consistent recognition function is a constant 
function. Any non model-based recognition scheme with a universal scope is subject to 
the same limitation, since such a scheme is required to be consistent on all the objects in 
its scope. Hence, any non model-based recognition scheme with a universal scope cannot 
discriminate between any two objects. 


3 Imperfect recognition functions 

Up to now, we have assumed that the recognition function must be entirely consistent. That is, it must have exactly the same value for all possible images of the same object. However, a recognition scheme may be allowed to make errors. We turn next to examine recognition functions that are less than perfect. In Section 3.1 we consider consistent functions with respect to viewing position that can have errors on a significant subset of images. In Section 3.2 we discuss functions that are almost consistent with respect to viewing position, in the sense that the function values for images of the same object are not necessarily identical, but only lie within a certain range of values.
Figure 2: A cube perceived as a 2-D hexagon when viewed from a certain viewing position.


3.1 Errors of the recognition function 

The human visual system may fail in some cases to correctly identify a given object when viewed from certain viewing positions. For example, it might identify a cube seen from a certain viewing angle as a 2-D hexagon (Fig. 2). The recognition function used by the human visual system is thus inconsistent for some images of the cube. The question is whether there exists a nontrivial universally consistent function when the requirements are relaxed: for each object, the recognition function is allowed to make errors (some arbitrary values that are different from the unique value common to all the other views) on a subset of views. This subset should not be large; otherwise the recognition process will fail too often.

Given a function $f$, for every object $x$ let $E_f(x)$ denote the set of viewing directions for which $f$ is incorrect ($E_f(x)$ is a subset of the unit sphere). The object $x$ is again taken to be a point in $\mathbb{R}^n$. We also assume that objects that are very similar to each other have similar sets of "bad" viewing directions. For example, if the cube in Fig. 2 is slightly distorted, the "bad" singular view will be only slightly shifted. More specifically, let us define, for each object $x$, the value $\Phi(x, \epsilon)$ to be the measure (on the unit sphere) of all the viewing directions for which $f$ is incorrect on at least one object in the neighborhood of radius $\epsilon$ around $x$. That is, $\Phi(x_0, \epsilon)$ is the measure of the set $\bigcup_{x \in B(x_0, \epsilon)} E_f(x)$. We can now show that even if $\Phi(x, \epsilon)$ is rather substantial (i.e., $f$ makes errors on a significant number of views), $f$ is still the trivial (constant) function. Specifically, if for every $x$ there exists an $\epsilon$ such that $\Phi(x, \epsilon) < D$ (where $D$ is about 14% of the possible viewing directions), then $f$ is a constant function. The proof of this claim is given in Appendix 2.
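Reading "measure on the unit sphere" as normalized so that the whole sphere has measure 1, the bound $D \approx 0.14$ means about 14% of all viewing directions. The sketch below only illustrates the quantity $\Phi$, not the proof: it estimates the normalized measure of a set of directions by uniform Monte Carlo sampling, with hypothetical "bad" sets given as predicates, and forms the union over a neighborhood as a logical OR of the individual sets. As a sanity check it uses spherical caps, whose exact measure is $(1 - \cos\theta)/2$; a single cap covering 14% of the sphere has an angular radius of roughly 44 degrees.

```python
import numpy as np


def sphere_measure(is_bad, samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the normalized measure of {v on S^2 : is_bad(v)}."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(samples, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # uniform on the unit sphere
    return float(np.mean(is_bad(dirs)))


def cap(center, theta: float):
    """Hypothetical bad set E_f(x): directions within angle theta of `center`."""
    c = np.asarray(center, dtype=float)
    c = c / np.linalg.norm(c)
    return lambda dirs: dirs @ c >= np.cos(theta)


def union(*bad_sets):
    """Bad directions of a neighborhood: the union of the individual bad sets."""
    return lambda dirs: np.logical_or.reduce([b(dirs) for b in bad_sets])


theta = np.arccos(1 - 2 * 0.14)  # a single cap of exact measure 0.14
print(np.degrees(theta), sphere_measure(cap([0, 0, 1], theta)))

half = np.arccos(1 - 2 * 0.07)   # two disjoint caps of measure 0.07 each
print(sphere_measure(union(cap([0, 0, 1], half), cap([0, 0, -1], half))))
```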
3.2 “Almost consistent” recognition functions


In practice, a recognition function may also not be entirely consistent in the sense that 
the function values for different images of the same object may not be identical, but only 
close to one another in some metric space (e.g., within an interval in R). In this case, 
a threshold function is usually used to determine whether the value indicates a given 
object. 

Let an object neighborhood be the range to which a given object is mapped by such an 
“almost consistent” function. Clearly, if the neighborhood of an object does not intersect 
the neighborhoods of other objects, then the function can be extended to be a consistent 
function by a simple composition of the threshold function with the almost consistent 
function. In this case, the result of the general case (Claim 1) still holds, and the function 
must be the trivial function. 
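When the neighborhoods are disjoint intervals of R, the composition with a threshold function is easy to write down. The sketch below assumes hypothetical, known, pairwise-disjoint closed intervals per object and maps any value to the name of the interval containing it. Composed with the almost consistent function, it yields a function that is exactly constant on each object's images, which is why Claim 1 applies to it.

```python
from typing import Callable, Dict, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi] of function values


def make_threshold(neighborhoods: Dict[str, Interval]) -> Callable[[float], Optional[str]]:
    """Map a value to the object whose (disjoint) neighborhood contains it, else None."""
    items = sorted(neighborhoods.items(), key=lambda kv: kv[1][0])
    # After sorting by lower end, checking adjacent pairs is enough to rule out overlaps.
    for (_, (_, prev_hi)), (_, (next_lo, _)) in zip(items, items[1:]):
        if next_lo <= prev_hi:
            raise ValueError("object neighborhoods must be pairwise disjoint")

    def t(value: float) -> Optional[str]:
        for name, (lo, hi) in items:
            if lo <= value <= hi:
                return name
        return None

    return t


t = make_threshold({"a": (0.9, 1.1), "b": (1.9, 2.1)})
almost = {"a-view1": 0.95, "a-view2": 1.08, "b-view1": 2.02}  # hypothetical f values
print({img: t(v) for img, v in almost.items()})  # a, a, b: consistent after composition
```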

If the neighborhoods of two objects, $a$ and $b$, intersect, then the scheme cannot discriminate between these two objects on the basis of images that are mapped to the intersection. In this case the images mapped to the intersection constitute a set of images for which $f$ is inconsistent. If the assumption from the previous section holds, then $f$ must again be the trivial function.

We have shown that an imperfect universal recognition function is still a constant 
function. It follows that any non model-based recognition scheme with a universal scope 
cannot discriminate between objects, even if it is allowed to make errors on a significant 
number of images. 


4 Consistent recognition functions for classes of objects

So far we have assumed that the scope of the recognition scheme was universal. That 
is, the recognition scheme could get as its input any set of (pointwise) 3-D objects. 
The recognition functions under consideration were therefore universally consistent with 
respect to viewing position. Clearly, this is a strong requirement. In the following sections 
we consider recognition schemes that are specific to classes of objects. In this case, the recognition function must still be consistent with respect to viewing position, but only for objects that belong to the class in question. That is, the function must be invariant
to viewing position for images of objects that belong to a given class of objects, but can 
have arbitrary values for images of objects that do not belong to this class. 

The possible existence of a nontrivial consistent recognition function for an object class depends on the particular class in question. In Section 4.1 we consider a simple class for which a nontrivial consistent function (with respect to viewing position) still does not exist. In Section 4.2 we discuss the existence of consistent functions for certain infinite classes of objects. We show that when a nontrivial consistent function exists, an upper bound on the discrimination power of any consistent function can be determined. Finally, we use the class of symmetric objects (Section 4.2.2) to demonstrate the existence of a consistent function for an infinite class of objects and its discrimination power.

4.1 The class of a prototypical object 

In this section, we consider the class of objects that are defined by a generic object. 
The class is defined to consist of all the objects that are sufficiently close to a given 
prototypical object. For example, it is reasonable to assume that all faces are within a 
certain distance from some prototypical face. For objects composed of n points in space, 
such a class can be thought of as a ball in $\mathbb{R}^{3n}$ around the prototypical object.

The results established for the unrestricted case hold for such classes of objects. That 
is, every consistent recognition function with respect to viewing position of all the objects 
that belong to a class of a given prototypical object is a constant function. The proof for 
this case is similar to the proof of the general case in Claim 1. 


4.2 A consistent recognition function 

Nontrivial consistent recognition functions can be defined for many infinite classes of objects. For example, consider the infinite class consisting of all eight-point objects whose points lie on the corners of some rectangular prism, together with all three-point objects. Clearly, a nontrivial consistent function for this class can be defined. However, using the same proof as in Claim 1, it can be shown that such a function will have only two possible values, one for the three-point objects and the other for the eight-point objects. Hence, the function can be used for classification of three- and eight-point objects, but cannot be used for identification of these objects. In this example the function is consistent for the class: all the views of a given object are mapped to the same value. However, the function has a limited discrimination power; it can only distinguish between two subclasses of objects. In the next section we examine further the discrimination power of a recognition function.
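The text leaves this function unspecified. One possible choice, given here only as an illustration, counts the distinct points in the binary image. No three corners of a rectangular prism are collinear, so each image point covers at most two corners and the eight corners never project to fewer than four distinct points (four is attained when viewing along an edge), whereas a three-point object never projects to more than three. The sketch assumes orthographic projection along a unit viewing direction and a small hypothetical tolerance for deciding when image points coincide.

```python
import numpy as np


def orthographic(points: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project Nx3 points onto the plane normal to `direction` (orthonormal image frame)."""
    v = direction / np.linalg.norm(direction)
    helper = np.array([1.0, 0.0, 0.0]) if abs(v[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u1 = np.cross(v, helper)
    u1 /= np.linalg.norm(u1)
    u2 = np.cross(v, u1)
    return points @ np.stack([u1, u2], axis=1)


def count_distinct(image: np.ndarray, tol: float = 1e-9) -> int:
    """Number of image points after merging points closer than tol."""
    kept = []
    for p in image:
        if all(np.linalg.norm(p - q) >= tol for q in kept):
            kept.append(p)
    return len(kept)


def classify(image: np.ndarray) -> int:
    """Hypothetical consistent function: 0 for three-point objects, 1 for prisms."""
    return 0 if count_distinct(image) <= 3 else 1


def prism(a: float, b: float, c: float) -> np.ndarray:
    """The eight corners of an axis-aligned a x b x c rectangular prism."""
    return np.array([[x, y, z] for x in (0, a) for y in (0, b) for z in (0, c)], dtype=float)


box = prism(1.0, 2.0, 3.0)
triple = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
for d in [np.array([0.0, 0.0, 1.0]), np.array([1.0, 2.0, 3.0]), np.array([0.3, -1.2, 0.7])]:
    print(classify(orthographic(box, d)), classify(orthographic(triple, d)))  # 1 0 each time
```

Note that the function only separates the two subclasses; by the argument above, no consistent function for this class can do more.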

4.2.1 Upper bound for function discrimination power 

Given a class of objects, we first define a reachability partition into equivalence subclasses. Two objects are within the same equivalence subclass if and only if they are reachable from each other. Reachability is clearly an equivalence relation, and therefore it divides the class into equivalence subclasses. Every function $f$ induces a partition of its domain into equivalence subclasses. That is, two objects, $a$ and $b$, belong to the same equivalence subclass if and only if $f(a) = f(b)$. Every consistent recognition function must have the same value for all objects in the same equivalence subclass defined by the reachability partition (the proof is the same as in Claim 1). However, the function can have different values for images of objects from different subclasses. Therefore, the reachability partition is a refinement of any partition induced by a consistent recognition function. That is, no consistent recognition function can discriminate between objects within the same reachability subclass.

The reachability subclasses in a given class of objects determine the upper bound
on the discrimination power of any consistent recognition function for that class. If the 
number of reachability subclasses in a given class is finite, then it is the upper bound for 
the number of values in the range of any consistent recognition function for this class. 
In particular, it is the upper bound for the number of objects that can be discriminated 
by any consistent recognition function for this class. Note that the notion of reachability 
and, consequently, the number of equivalence classes, is independent of the particular 
recognition function. If the function discrimination power is low, the function is not very 
helpful for recognition but can be used for classification, the classification being into the 
equivalence subclasses. 

In a non model-based recognition scheme, a consistent function must assign the same 
value to every two objects that are reachable within the scope of the scheme. In contrast, 
a recognition function in a model-based scheme is required to assign the same value to 
every two objects that are reachable within the set of objects that the function must 
in fact recognize. Two objects can be unreachable within a given set of objects but be 
reachable within the scope of objects. A recognition function can therefore discriminate 
between two such objects in a model-based scheme, but not in a non model-based scheme. 
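For a finite sample of objects, the reachability partition is the set of connected components of the "shares an orthographic projection" graph, which a union-find structure computes directly. In the sketch below the object names and the pairs that share a projection are hypothetical inputs (deciding whether two point sets actually share a projection is a separate geometric problem). The number of components bounds the number of values of any consistent function, and restricting the graph to a smaller set of objects, as a model-based scheme may, can only split components.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reachability_classes(objects: Iterable[str],
                         shared: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the graph whose edges join objects sharing a projection."""
    objects = list(objects)
    parent: Dict[str, str] = {o: o for o in objects}

    def find(o: str) -> str:
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving
            o = parent[o]
        return o

    for a, b in shared:
        if a in parent and b in parent:  # keep only edges inside the given set
            parent[find(a)] = find(b)

    classes: Dict[str, Set[str]] = {}
    for o in objects:
        classes.setdefault(find(o), set()).add(o)
    return list(classes.values())


scope = ["a", "b", "c", "d", "e"]
shares_projection = [("a", "c"), ("c", "b"), ("d", "e")]
print(len(reachability_classes(scope, shares_projection)))       # 2: {a, b, c} and {d, e}
print(len(reachability_classes(["a", "b"], shares_projection)))  # 2: a, b unreachable within {a, b}
```

Within the scope, $a$ and $b$ are reachable through $c$, so a non model-based scheme cannot separate them; within the set $\{a, b\}$ they are not, so a model-based scheme for that set can.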


4.2.2 The class of symmetric objects 

The class of symmetric objects is a natural class to examine. For example, schemes for identifying faces, cars, tables, etc. all deal with symmetric (or approximately symmetric) objects. A recognition scheme for identifying objects belonging to one of these classes need only be consistent for symmetric objects.

In the section below we examine the class of bilaterally symmetric objects. We will 
determine the reachability subclasses of this class, and derive explicitly a recognition 
function with the optimal discrimination power. We consider images such that for every 
point in the image, its symmetric point appears in the image. 
Without loss of generality, let a symmetric object be $(p_1, p_2, \ldots, p_{2n})$, where $p_i = (x_i, y_i, z_i)$ and $p_{n+i} = (-x_i, y_i, z_i)$ for $1 \le i \le n$. That is, $p_i$ and $p_{n+i}$ are a pair of points symmetric about the $y$-$z$ plane for $1 \le i \le n$. Let $p'_i = (x'_i, y'_i, z'_i)$ be the new coordinates of a point $p_i$ following a rotation by a rotation matrix $R = (r_{jk})$ and scaling by a scaling factor $s$. The new $x$-coordinates are:

$$x'_i = s\,(x_i r_{11} + y_i r_{12} + z_i r_{13})$$

$$x'_{n+i} = s\,(-x_i r_{11} + y_i r_{12} + z_i r_{13})$$

In particular, for every two symmetric points $p_i$ and $p_{n+i}$, $x'_i - x'_{n+i} = 2 s x_i r_{11}$. For every $i$ the following ratios hold (assuming $x_1 r_{11} \neq 0$):

$$\frac{x'_i - x'_{n+i}}{x'_1 - x'_{n+1}} = \frac{x_i}{x_1}$$

In the same manner, since $y'_i - y'_{n+i} = 2 s x_i r_{21}$, it can be shown that for every $i$ the following ratios hold (assuming $x_1 r_{21} \neq 0$):

$$\frac{y'_i - y'_{n+i}}{y'_1 - y'_{n+1}} = \frac{x_i}{x_1}$$

It follows that the ratios between the distances of two pairs of symmetric points do 
not change when the object is rotated in space and scaled. 

We will show that these ratios define a nontrivial partition of the class of symmetric objects into equivalence subclasses, such that objects in different subclasses are unreachable from one another. Let $d(p_i, p_j)$ be the distance between the points $p_i$ and $p_j$. Define the function $h$ by

$$h(p_1, p_2, \ldots, p_{2n}) = \left( \frac{d(p_2, p_{n+2})}{d(p_1, p_{n+1})}, \frac{d(p_3, p_{n+3})}{d(p_1, p_{n+1})}, \ldots, \frac{d(p_n, p_{2n})}{d(p_1, p_{n+1})} \right)$$
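The invariance of $h$ can be checked numerically. The sketch below, with illustrative names, builds a random bilaterally symmetric object with pairs $p_i, p_{n+i}$ mirrored in the $y$-$z$ plane, computes $h$ from the 3-D points, and compares it with the same ratios measured between projected symmetric pairs after a random rotation, scaling and orthographic projection onto the $x$-$y$ plane. By the derivation above, each image distance equals $2 s |x_i| \sqrt{r_{11}^2 + r_{21}^2}$, so the ratios agree. The sketch assumes the first pair does not collapse to a single image point.

```python
import numpy as np


def symmetric_object(half: np.ndarray) -> np.ndarray:
    """Stack p_1..p_n = half and p_{n+i} = (-x_i, y_i, z_i)."""
    mirrored = half * np.array([-1.0, 1.0, 1.0])
    return np.vstack([half, mirrored])


def h(points: np.ndarray) -> np.ndarray:
    """Ratios d(p_i, p_{n+i}) / d(p_1, p_{n+1}) for i = 2..n (3-D or 2-D points)."""
    n = len(points) // 2
    d = np.linalg.norm(points[:n] - points[n:], axis=1)
    return d[1:] / d[0]


def random_rotation(rng) -> np.ndarray:
    """A rotation matrix obtained from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))       # fix the column signs of the factorization
    if np.linalg.det(q) < 0:       # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q


rng = np.random.default_rng(2)
obj = symmetric_object(rng.normal(size=(5, 3)))
R, s = random_rotation(rng), 0.7
image = (s * obj @ R.T)[:, :2]    # rotate, scale, keep (x', y')
print(np.allclose(h(obj), h(image)))  # True
```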

Claim 2: Every two symmetric objects $a$ and $b$ are reachable if and only if $h(a) = h(b)$.

Proof:

Let $h(a) = h(b)$. We have to show that $a$ and $b$ are reachable by a symmetric sequence. That is, there exists a sequence of symmetric objects starting at $a$ and ending at $b$ such that every two successive objects have an orthographic projection in common. This is proved in Appendix 3.

Let $h(a) \neq h(b)$. We have to show that $a$ and $b$ are not reachable by a sequence of symmetric objects. Assume that there is a sequence of symmetric objects $a = a_1, a_2, \ldots, a_m = b$ such that every two successive objects have a common orthographic projection. For every two successive objects $a_i$ and $a_{i+1}$, $h(a_i) = h(a_{i+1})$, because $a_i$ and $a_{i+1}$ have a common orthographic projection and $h$ is independent of the viewing position. It follows that $h(a_i) = h(a_j)$ for every two objects $a_i$ and $a_j$ in the sequence connecting $a$ and $b$. This contradicts the assumption that $h(a_1) = h(a) \neq h(b) = h(a_m)$.
□


It follows from this claim that a consistent recognition function with respect to viewing position defined for all symmetric objects can only discriminate between objects that differ in the relative distances of symmetric points.


5 Consistent recognition function for grey level images

So far, we have considered only binary images. In this section we consider grey level images of Lambertian objects that consist of n small surface patches in space (this can be thought of as sampling an object at n different points). Each point $p$ has a surface normal $N_p$ and a reflectance value $\rho_p$ associated with it. The image of a given object now depends on the points' locations, normals and reflectances, and also on the illumination conditions, that is, the level of illumination and the position and distribution of the light sources.

An image now contains more information than before: in addition to the location of the n points, we now have the grey levels of the points. The question we consider is whether under these conditions objects may become more discriminable than before by a consistent recognition function. We now have to consider consistent recognition functions with respect to both illumination conditions and viewing position. We show that a nontrivial universally consistent recognition function with respect to illumination conditions and viewing position still does not exist.

Claim 3: Any universally consistent function with respect to illumination condition 
and viewing position, that is defined on grey level images of objects consisting of n surface 
patches, is the trivial function. 

In order to prove this claim, we will show that every two objects are reachable. That is, there exists a sequence of objects starting with the first and ending with the second object, and every successive pair in the sequence has a common image. A pair of objects has a common image if there are an illumination condition and a viewing position such that the two images (the points' locations as well as their grey levels) are identical. The proof is given in Appendix 4.
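A very simple instance of a common grey-level image can be computed directly. The sketch assumes a single distant point source and the Lambertian model $I_p = \lambda \rho_p \max(0, N_p \cdot L)$, where the intensity $\lambda$ and direction $L$ are our notation; two objects that differ only in their reflectances by a common factor produce identical images (positions and grey levels) when the source intensity is scaled inversely. This is not the construction of Appendix 4, only an illustration of what a common image means; reflectances are kept in (0, 1] so that both objects are physically plausible.

```python
import numpy as np


def lambertian_image(points, normals, albedo, light_dir, intensity):
    """Orthographic (x, y) positions plus Lambertian grey levels for n surface patches."""
    L = light_dir / np.linalg.norm(light_dir)
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    grey = intensity * albedo * np.clip(n @ L, 0.0, None)  # attached shadows clipped to 0
    return points[:, :2], grey


rng = np.random.default_rng(3)
pts, nrm = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
rho = rng.uniform(0.2, 0.5, size=6)
light = np.array([0.2, 0.3, 1.0])

pos_a, grey_a = lambertian_image(pts, nrm, rho, light, intensity=2.0)      # object A
pos_b, grey_b = lambertian_image(pts, nrm, 2 * rho, light, intensity=1.0)  # object B: doubled reflectance
print(np.allclose(pos_a, pos_b) and np.allclose(grey_a, grey_b))  # True
```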

We conclude that the limitations on consistent recognition functions with respect to viewing position do not change when grey level values are also given at the image points. In particular, it follows that a consistent recognition scheme that must recognize objects regardless of the illumination conditions and viewing position must be model-based.


6 Model-based recognition schemes 

In this section we show a model-based recognition scheme that is both consistent and nontrivial. Clearly, if every two objects in the set are reachable, then a nontrivial consistent recognition function does not exist. We therefore consider sets of objects in which at least some of the objects are not reachable by a sequence of objects from the set. For such a set of objects, it is possible to define a recognition function that (i) is consistent and (ii) has different values for objects that do not share a view. The definition of the function, in this case, depends strongly on the 3-D models of these objects. To construct the function, we use as an example the linear combination approach (Ullman & Basri 1989).

An object model in the linear combination scheme contains a number of images of a 
given object together with the correspondence between the image points. Every image of 
the object in question, taken from an arbitrary viewing position, can be expressed as the 
linear combination of the location of the corresponding points in the model images. Given 
an image and a candidate model, the first stage of the linear combination scheme consists 
of computing the coefficients of the linear combination. The linear combination of the 
model images is then computed, and the result (the transformed model) is compared 
with the image. If the two agree, the image is identified as an instance of the model. In 
the case of a finite set of objects, this computation can be repeated for every model in 
the set. The value of the function can be, for example, a canonical name for the object. 
Clearly, the function has the same value for all images of the same object and different 
values for images of different objects that do not share an orthographic projection. 
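The core test of the linear combination scheme can be sketched in a few lines of NumPy. Under scaled orthographic projection, the $x$- and $y$-coordinates of any view of a rigid object lie in the span of $x_1$, $y_1$, $x_2$ from two model views plus a constant for translation, provided the second model view is rotated out of the first image plane. The sketch below is a simplified version under these assumptions: point correspondences are known, the coefficients are fitted by least squares, the transformed model is compared with the image, and the residual threshold `tol` is an illustrative choice.

```python
import numpy as np


def rotation_x(angle: float) -> np.ndarray:
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rotation_y(angle: float) -> np.ndarray:
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def view(points, R, scale=1.0, shift=(0.0, 0.0)):
    """Scaled orthographic view: rotate, scale, drop z, translate in the image plane."""
    return scale * (points @ R.T)[:, :2] + np.asarray(shift)


def lc_residual(model_views, image) -> float:
    """Residual of the image coordinates after the best fit in span{x1, y1, x2, 1}."""
    v1, v2 = model_views
    basis = np.column_stack([v1[:, 0], v1[:, 1], v2[:, 0], np.ones(len(v1))])
    residual = 0.0
    for coord in (image[:, 0], image[:, 1]):
        coeffs, *_ = np.linalg.lstsq(basis, coord, rcond=None)
        residual += float(np.linalg.norm(basis @ coeffs - coord))  # transformed model vs image
    return residual


def matches(model_views, image, tol: float = 1e-6) -> bool:
    return lc_residual(model_views, image) < tol


rng = np.random.default_rng(4)
obj_a, obj_b = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
model_a = (view(obj_a, np.eye(3)), view(obj_a, rotation_y(0.4)))  # two model views of A
R_new = rotation_x(0.7) @ rotation_y(-1.1)
print(matches(model_a, view(obj_a, R_new, 1.3, (2.0, -1.0))))  # True: a new view of A
print(matches(model_a, view(obj_b, R_new)))                      # False: B is not A
```

For a finite set of models, repeating this test for each model and returning the name of the matching one gives the consistent, model-dependent recognition function described above.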

Alternative schemes may also be used for the same task. The main point of the 
example is that a consistent recognition function that is as discriminating as possible, 
can be tailored to a given set of objects. 
7 Conclusion


In this paper we have established some limitations on non model-based recognition 
schemes. In particular, we have established the following claims: 

• Every function that is invariant to viewing position of all possible point objects is a constant function. It follows that every consistent recognition scheme for arbitrary 3-D objects must be model-based.

• If the recognition function is allowed to make mistakes and misidentify each object 
from a substantial fraction of viewing directions (about 14%) it is still a constant 
function. 

We have considered recognition schemes restricted to classes of objects and showed the following. For some classes (such as classes defined by a prototypical object) the only consistent recognition function is the trivial function. For other classes (such as the class of symmetric objects), a nontrivial consistent recognition scheme exists. We have defined the notion of the discrimination power of a consistent recognition function for a class of objects. We have shown that it is possible to determine an upper bound on the discrimination power of every consistent recognition function for a given class of objects. The bound is determined by the number of equivalence subclasses (determined by the reachability relation). For the class of symmetric objects, these subclasses were derived explicitly.

For grey level images, we have established that the only consistent recognition function 
with respect to viewing position and illumination conditions is the trivial function. 

In this study we considered only objects that consist of points on surface patches in 
space. Real objects are more complex. However, many recognition schemes proceed by 
first finding special contours or points in the image, and then applying the recognition 
process to them. The points found by the first stage are usually projections of stable 
object points. When this is the case, our results apply to these schemes directly. For 
consistent recognition functions that are defined on contours or surfaces, our results do
not apply directly, unless the function is applied to contours or surfaces as sets of points.
In the future we plan to extend the results to contours and surfaces.


Appendix 1 

In this Appendix we prove that in the general case every two objects are reachable from 
one another. 
First note that the projection of two points, when viewed from the direction of the
vector that connects the two points, is a single point. It follows that for every object with
$n-1$ points there is an object with $n$ points such that the two objects have a common
orthographic projection. Hence, it is sufficient to prove the following claim:

Claim 4: Any two objects that consist of the same number of points in space are
reachable from one another.

Proof: Consider two arbitrary rigid objects, $a$ and $b$, with $n$ points. We have to show
that $b$ is reachable from $a$, that is, that there exists a sequence of objects such that every
two successive objects have a common orthographic projection.

Let the first object in the sequence be $a$. Each object in the sequence consists of the
same points as the previous object, except for one point of object $a$, which is replaced by
a new point of object $b$. For example, the second object in the sequence consists of $n-1$
points of object $a$ and one point of object $b$.

The formal definition of the sequence can be written as follows. Let the object $a$ be
$a = (p^a_1, p^a_2, \ldots, p^a_n)$ and the object $b$ be $b = (p^b_1, p^b_2, \ldots, p^b_n)$, where $p^a_i$ and $p^b_i$ are points in
3-D. The first and last objects in the sequence are $a$ and $b$, i.e., $a_1 = a$ and $a_{n+1} = b$. We
take the rest of the sequence, $a_2, \ldots, a_n$, to be the objects
$$a_i = (p^b_1, \ldots, p^b_{i-1}, p^a_i, \ldots, p^a_n).$$
For example, $a_2 = (p^b_1, p^a_2, \ldots, p^a_n)$.

The first and the last objects in the sequence are $a$ and $b$, respectively. All that is left
to show is that for every two successive objects in the sequence there exists a direction 
such that the two objects project to the same image. By the sequence construction, every 
two successive objects differ by only one point. The two non-identical points project to 
the same image point on the plane perpendicular to the vector that connects them. 
Clearly, all the identical points project to the same image independent of the projection 
direction. Therefore, the direction in which the two objects project to the same image is 
the vector defined by the two non-identical points of the successive objects. □ 
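The construction can be checked numerically. This Python sketch (helper names hypothetical) builds the sequence $a_1, \ldots, a_{n+1}$ for two random 5-point objects, takes for each successive pair the vector joining the two non-identical points, and verifies that the two objects project to the same image along it. The same `project` helper also illustrates the opening observation of this appendix: two points project to a single point along the vector that connects them. Image points are kept as 3-D points on the image plane rather than in 2-D image coordinates, which does not affect the comparison.

```python
import math
import random

def project(points, direction):
    """Orthographic projection of each point onto the plane perpendicular to `direction`
    (image points are kept in 3-D coordinates)."""
    norm2 = sum(d * d for d in direction)
    image = []
    for p in points:
        t = sum(x * d for x, d in zip(p, direction)) / norm2
        image.append(tuple(x - t * d for x, d in zip(p, direction)))
    return image

def reachability_sequence(a, b):
    """Objects a_1, ..., a_{n+1}; entry i is (p^b_1, ..., p^b_i, p^a_{i+1}, ..., p^a_n)."""
    return [list(b[:i]) + list(a[i:]) for i in range(len(a) + 1)]

def joining_direction(obj1, obj2):
    """Vector joining the single pair of points in which two successive objects differ."""
    diffs = [(p, q) for p, q in zip(obj1, obj2) if p != q]
    assert len(diffs) <= 1, "successive objects must differ in at most one point"
    if not diffs:
        return (0.0, 0.0, 1.0)  # identical objects: any direction will do
    p, q = diffs[0]
    return tuple(y - x for x, y in zip(p, q))

def same_image(obj1, obj2, direction, tol=1e-9):
    """True if corresponding points of the two objects project to the same image points."""
    return all(math.dist(u, w) < tol
               for u, w in zip(project(obj1, direction), project(obj2, direction)))

random.seed(1)
a = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(5)]
b = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(5)]
seq = reachability_sequence(a, b)
print(len(seq), seq[0] == a, seq[-1] == b)  # 6 True True
print(all(same_image(s, t, joining_direction(s, t)) for s, t in zip(seq, seq[1:])))  # True
```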


Appendix 2 

In this appendix we show that even an imperfect recognition function is a constant 
function, provided that the sets of viewing directions on which it fails are not too large. 

An object $x$ is taken to be a point in $\mathbb{R}^n$. For each object $x$ we assume that $f$ gets a
unique value, considered the “correct” value for $x$, for most of the views, but it is allowed
to have different, incorrect values for other views. $\Phi_f(x, \epsilon_x)$ is the measure of the set of
viewing directions for which $f$ is incorrect on at least one object in a neighborhood of
radius $\epsilon_x$ around $x$ (the units of $\Phi_f(x, \epsilon_x)$ are steradians²). We wish to show that if for
every $x$ there exists an $\epsilon_x$ such that $\Phi_f(x, \epsilon_x) < D$ for a certain constant $D$, then $f$ must
be the trivial (constant) function. We show that $D$ may be taken as large as about 14% of
the set of all viewing directions.
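The cone formula of footnote 2 can be checked numerically, and used to convert the bound $D = 0.92$ sr of the Lemma below into fractions of the sphere of directions. The function names are hypothetical. The 14% figure matches the fraction of a hemisphere ($2\pi$ sr). That reading is an assumption here, since the text does not say which total it uses; it is plausible because the directions $v$ and $-v$ make the same pairs of points coincide. Measured against the full sphere ($4\pi$ sr), the fraction would be about 7%.

```python
import math

def cone_solid_angle(half_angle_deg):
    """Solid angle in steradians of a cone with the given half-angle: 2*pi*(1 - cos alpha)."""
    return 2 * math.pi * (1 - math.cos(math.radians(half_angle_deg)))

def cone_half_angle(solid_angle_sr):
    """Inverse of cone_solid_angle: half-angle in degrees enclosing the given solid angle."""
    return math.degrees(math.acos(1 - solid_angle_sr / (2 * math.pi)))

D = 0.92  # bound used in the Lemma of this appendix, in steradians
print(round(D / (4 * math.pi), 3))      # 0.073: fraction of the full sphere of directions
print(round(D / (2 * math.pi), 3))      # 0.146: fraction of a hemisphere
print(round(cone_half_angle(D), 1))     # 31.4: half-angle of a cone of solid angle D
print(round(cone_solid_angle(35), 3))   # 1.136: cone with the 35-degree angle of Claim 7
```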

Let us define a pair of objects, $(a, b)$, to be an $f$-correct pair if $a$ and $b$ have a common
orthographic projection and the value of $f$ on this common orthographic projection is
correct for both objects. An $f$-correct sequence is a sequence such that each successive
pair of objects is an $f$-correct pair. We say that objects $a$ and $b$ are $f$-reachable if there
exists an $f$-correct sequence starting at $a$ and ending at $b$.
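Read as a graph problem, $f$-reachability asks whether two objects are connected by a path whose edges are $f$-correct pairs. The sketch below (names hypothetical) searches a finite pool of candidate objects breadth-first. The paper's objects range over a continuum, so this only illustrates the definition. In the toy usage, numbers stand in for objects, $|x - y| \le 1$ stands in for having a common projection, and $f$ is assumed to fail on the pair $(2, 3)$. The extra candidate 2.5 repairs the sequence in the same way as the object $c_0$ of Claim 6.

```python
from collections import deque

def f_reachable(start, goal, candidates, f_correct_pair):
    """Breadth-first search for an f-correct sequence from start to goal.

    candidates: a finite pool of (hashable) objects the sequence may pass through.
    f_correct_pair(x, y): True when x and y form an f-correct pair.
    Returns the sequence as a list, or None if no f-correct sequence exists in the pool.
    """
    pool = list(candidates) + [goal]
    parent = {start: None}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        if x == goal:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in pool:
            if y not in parent and f_correct_pair(x, y):
                parent[y] = x
                queue.append(y)
    return None

# Toy stand-in (hypothetical): "common projection" means |x - y| <= 1,
# and f happens to be incorrect on the pair (2, 3).
def toy_pair(x, y):
    return 0 < abs(x - y) <= 1 and {x, y} != {2, 3}

print(f_reachable(0, 4, [1, 2, 3, 2.5], toy_pair))  # [0, 1, 2, 2.5, 3, 4]
print(f_reachable(0, 4, [1, 2, 3], toy_pair))       # None
```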

Using a proof similar to that of the general case (Claim 1), it is sufficient to prove the following
lemma.

Lemma: Let $f$ be a recognition function defined on objects that consist of $n$ points in
space. Let $\Phi_f(x, \epsilon_x)$ be the measure of viewing directions for which $f$ is incorrect on at
least one of the objects in the $\epsilon_x$-neighborhood of $x$. Assume that for every $x$ there exists
an $\epsilon_x$ such that $\Phi_f(x, \epsilon_x) < D$ ($D$ is fixed for all objects and taken below 0.92 steradians,
which is about 14% of all possible viewing directions). Then every two objects
(consisting of $n$ points in space) are $f$-reachable.

Proof: Every object that consists of $n$ points in space can be regarded as a point in
$\mathbb{R}^{3n}$. We assume that if $f$ is correct on one image, then it will also be correct on the
same image scaled by any factor. That is, if $a$ and $b$ are $f$-reachable, then $a$ and $b$ scaled
by any factor are also $f$-reachable. Hence, it is sufficient to consider only objects that
are points in the unit sphere in $\mathbb{R}^{3n}$, which we denote by $B_0^{3n}$.

Let $a$ and $b$ be two objects in $B_0^{3n}$. Consider the original sequence of objects connecting
$a$ and $b$ as in Claim 1 (see Appendix 1). If the sequence is $f$-correct, then we are
done. Otherwise, using the three claims below we will show that it is always possible
to construct an $f$-correct sequence of objects connecting any $f$-incorrect pair of objects
(note that successive objects in the sequence differ by a single point). The $f$-correct
sequence between the objects $a$ and $b$ is then the original sequence with additional
subsequences between all the originally $f$-incorrect pairs of objects. We first list the
three claims, then give their proofs.

Claim 5: There exists a fixed $r$ such that for every object $x \in B_0^{3n}$, $\Phi_f(x, r) < D$.

In the following two claims, let $(a_i, a_{i+1})$ be an $f$-incorrect pair of successive objects
from the original sequence. Let $d$ be the distance between $a_i$ and $a_{i+1}$ as measured in
$\mathbb{R}^{3n}$.

2 The solid angle of a cone of directions is measured by the area it cuts out on the surface of the unit sphere; a steradian is the unit of this measure. The
area cut out by a cone with apex half-angle $\alpha$ on the surface of a unit sphere is $2\pi(1 - \cos\alpha)$.
Claim 6: If $d < 2r$, then there exists an object $c_0$ such that the pairs $(a_i, c_0)$ and
$(a_{i+1}, c_0)$ are $f$-correct.

Claim 7: If $d > 2r$, then there exist two objects, $c_0$ and $c_1$, such that:

(a) The pairs $(c_0, a_i)$ and $(c_1, a_{i+1})$ are $f$-correct.

(b) The distance between $c_0$ and $c_1$ is less than $d - \rho$, where $\rho$ is a constant strictly greater
than zero.

We now show that these claims suffice. Given the original sequence from Claim 1,
replace every $f$-incorrect pair of objects $(a_i, a_{i+1})$ such that $d < 2r$ by the subsequence
$(a_i, c_0, a_{i+1})$ from Claim 6. Replace every $f$-incorrect pair of objects $(a_i, a_{i+1})$ such that
$d > 2r$ by the subsequence $(a_i, c_0, c_1, a_{i+1})$ from Claim 7. If $(c_0, c_1)$ is still $f$-incorrect,
repeat the process until the distance between the two new objects is less than $2r$. Claim 7(b)
guarantees that this process needs to be repeated only a finite number of times. As a
result, an $f$-correct sequence consisting of a finite number of objects is obtained.
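The finiteness argument can be made concrete. If every application of Claim 7 shortens the distance by more than $\rho$, then starting from distance $d \ge 2r$ at most $\lfloor (d - 2r)/\rho \rfloor + 1$ applications are needed before Claim 6 applies. The sketch below (names hypothetical) simulates the worst case, where the decrease is exactly $\rho$, and compares it with that closed form. The inputs $d = 10$, $r = 1$, $\rho = 0.797$ are illustrative only.

```python
import math

def worst_case_rounds(d, r, rho):
    """Rounds of Claim 7 needed before the new pair is closer than 2r, in the worst case
    where each round shortens the distance by exactly rho (Claim 7b promises more)."""
    rounds = 0
    while d >= 2 * r:
        d -= rho
        rounds += 1
    return rounds

def rounds_bound(d, r, rho):
    """Closed form of the same count: floor((d - 2r) / rho) + 1 when d >= 2r, else 0."""
    return 0 if d < 2 * r else math.floor((d - 2 * r) / rho) + 1

print(worst_case_rounds(10.0, 1.0, 0.797), rounds_bound(10.0, 1.0, 0.797))  # 11 11
```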

We next prove Claims 5, 6, and 7.

Proof of 5: Let $B_0^{3n}$ be the closed unit sphere in $\mathbb{R}^{3n}$. For every $x \in B_0^{3n}$ there exists an $\epsilon_x$ such that $\Phi_f(x, \epsilon_x) < D$. Consider the family of open sets $B^{3n}(x, \epsilon_x/2)$ for every
$x \in B_0^{3n}$. This is an infinite cover of the unit sphere $B_0^{3n}$. Since the sphere $B_0^{3n}$ is a
compact set, there exists a finite subset, $\{B^{3n}(x_i, \epsilon_i/2)\}_{i=0}^{m}$, that covers $B_0^{3n}$. Let $r$ be the
minimum radius in the finite cover, $r = \min_i \epsilon_i/2$.

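Every point $x \in B_0^{3n}$ satisfies $x \in B^{3n}(x_i, \epsilon_i/2)$ for some $0 \le i \le m$. Since $r \le \epsilon_i/2$, the ball $B^{3n}(x, r)$ is contained in $B^{3n}(x_i, \epsilon_i)$, and $\Phi_f$ is nondecreasing in its radius argument. We thus have:
$$\Phi_f(x, r) \le \Phi_f\left(x_i, \tfrac{\epsilon_i}{2} + r\right) \le \Phi_f\left(x_i, 2 \cdot \tfrac{\epsilon_i}{2}\right) = \Phi_f(x_i, \epsilon_i) < D.$$
Hence, $\Phi_f(x, r) < D$ for every $x \in B_0^{3n}$. □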

Proof of 6: By the sequence construction, the objects $a_i$ and $a_{i+1}$ differ by only one
point. Let $\bar{a}$ be the object that consists of the $n-1$ identical points of $a_i$ and $a_{i+1}$. Let
$p_i$ and $p_{i+1}$ be the non-identical points of $a_i$ and $a_{i+1}$, respectively. We define the object
$a \oplus b$ to be the object that consists of the points of both $a$ and $b$. For example, $a_i = \bar{a} \oplus p_i$
and $a_{i+1} = \bar{a} \oplus p_{i+1}$. Let us define the function $f'$ on images of one-point objects in the
following way: $f'(p) = f(\bar{a} \oplus p)$.

Every object that consists of one point in space can be regarded as a point in $\mathbb{R}^3$.
Therefore, in order to consider the objects of the form $\bar{a} \oplus p$, it is sufficient to consider
the unit sphere in $\mathbb{R}^3$ and the function $f'$.

Let $p$ be the point $(p_i + p_{i+1})/2$. The distance between $p_i$ and $p_{i+1}$ is less than $2r$. Therefore,
$p_i, p_{i+1} \in B(p, r)$ (a ball of radius $r$ centered at $p$). Consider the plane $v$ of equidistant
points from $p_i$ and $p_{i+1}$ in the sphere $B(p, r)$. We claim that there exists a point $p_c$ on
$v$ such that the pairs $(p_i, p_c)$ and $(p_{i+1}, p_c)$ are $f'$-correct. Let us assume that such a
point does not exist. That is, for every point $p_c \in v$ one of the pairs $(p_c, p_i)$ or $(p_c, p_{i+1})$
is $f'$-incorrect. The function $f'$ must, therefore, be incorrect on at least $D$ directions of
viewing position for one of the objects in $v \cup \{p_i, p_{i+1}\}$. It follows that $\Phi_{f'}(p, r) \ge D$.
From the definition of $\Phi$ and $f'$ it follows that for every $p \in B_0^3$, $\Phi_{f'}(p, r) \le \Phi_f(\bar{a} \oplus p, r)$.
We can assume without loss of generality³ that the objects $a_i$ and $a_{i+1}$ are in $B_0^{3n}$ at a distance of at least $r$ from its boundary. If this is
the case, it can be shown that $a_p = \bar{a} \oplus p$ is in $B_0^{3n}$. By Claim 5 we have that $\Phi_f(\bar{a} \oplus p, r) < D$,
and hence also $\Phi_{f'}(p, r) < D$. Hence we have a contradiction. □

Figure 3: The two non-identical points, $p_i$ and $p_{i+1}$, and the portions of the spheres'
surfaces, $S_{p_i}$ and $S_{p_{i+1}}$.

Proof of 7: Let $p_i$ and $p_{i+1}$ be the non-identical points of $a_i$ and $a_{i+1}$. Consider the two
spheres $B_i = B(p_i, r)$ and $B_{i+1} = B(p_{i+1}, r)$. Let $S_{p_i}$ be the portion of the sphere surface
$B(p_i, r)$ cut by a cone whose apex is $p_i$, whose axis points from $p_i$ toward $p_{i+1}$, and whose apex (half-)angle is
$35°$ (see Fig. 3). $S_{p_{i+1}}$ is defined in a similar manner.

There must be at least one point $p_{c_0}$ on $S_{p_i}$ such that the pair $(p_i, p_{c_0})$ is $f'$-correct.
Otherwise, $\Phi_{f'}(p_i, r) \ge D$, in contradiction with the fact that $\Phi_{f'}(p_i, r) \le \Phi_f(\bar{a} \oplus p_i, r) = \Phi_f(a_i, r) < D$
(Claim 5). For the same reason, there exists a point $p_{c_1}$ on $S_{p_{i+1}}$ such that the pair
$(p_{i+1}, p_{c_1})$ is $f'$-correct. All that is left to be shown is that the distance, $s$, between $p_{c_0}$
and $p_{c_1}$ is smaller than $d - \rho$.

Let $s_m$ be the maximal distance between two points on $S_{p_i}$ and $S_{p_{i+1}}$ for a given $d$. It
is sufficient to show that $d - s_m > \rho$ (note that $s_m$ is a function of $d$). Let $p_{c_0} \in S_{p_i}$ and
$p_{c_1} \in S_{p_{i+1}}$ be two points such that the distance between them is maximal for a given $d$.
From symmetry considerations, the line connecting the points $p_{c_0}$ and $p_{c_1}$ intersects the
line that connects the points $p_i$ and $p_{i+1}$. Denote by $o$ the intersection point. Denote
by $s_0$ and $s_1$ the distances between $p_{c_0}$ and $o$, and between $p_{c_1}$ and $o$, respectively; denote by $d_0$
and $d_1$ the distances between $p_i$ and $o$, and between $p_{i+1}$ and $o$, respectively. Using the cosine rule,
$$s_0 = \sqrt{d_0^2 - 2 d_0 r \cos\alpha + r^2}, \qquad 0 \le \alpha \le 35°.$$
Clearly, $s_0$ has its maximum when $\alpha = 35°$.

3 By scaling the two given objects such that the distance between each of them and the unit sphere
surface will be at least $r$.

The difference $d_0 - s_0$ is monotonically increasing in $d_0$ (this can be readily proved by taking the derivative
$d(d_0 - s_0)/dd_0 = 1 - (d_0 - r\cos\alpha)/s_0$, which is positive because $s_0^2 = (d_0 - r\cos\alpha)^2 + r^2\sin^2\alpha$). From symmetry considerations, $s_0 = s_1$ and $d_0 = d_1 = d/2 > r$. Therefore it is sufficient to compute $s_r$, the value of $s_0$ when $d_0 = r$:
$$s_r = r\sqrt{2 - 2\cos 35°}.$$
We therefore obtain
$$d - s \ge d - s_m = 2(d_0 - s_0) > 2(r - s_r) = 2r\left(1 - \sqrt{2 - 2\cos 35°}\right) = \rho.$$
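A numerical check of these last steps, with the illustrative choice $r = 1$ (names hypothetical). The sketch evaluates $\rho = 2r(1 - \sqrt{2 - 2\cos 35°})$, confirms on a few sample values that $d_0 - s_0$ increases with $d_0$ at $\alpha = 35°$, and confirms that $2(r - s_r) = \rho$.

```python
import math

ALPHA = math.radians(35.0)  # cone angle used in the proof of Claim 7

def s0(d0, r, alpha=ALPHA):
    """Cosine rule: distance from o to a point at distance r from p_i, alpha off the axis."""
    return math.sqrt(d0 * d0 - 2 * d0 * r * math.cos(alpha) + r * r)

def rho(r, alpha=ALPHA):
    """The constant of Claim 7(b): 2r(1 - sqrt(2 - 2 cos alpha))."""
    return 2 * r * (1 - math.sqrt(2 - 2 * math.cos(alpha)))

r = 1.0  # illustrative; the proof only needs r > 0
print(round(rho(r), 3))  # 0.797
# d0 - s0 grows with d0, so over d0 >= r its minimum is at d0 = r.
gaps = [d0 - s0(d0, r) for d0 in (1.0, 1.5, 2.0, 4.0, 8.0)]
print(all(g1 < g2 for g1, g2 in zip(gaps, gaps[1:])))  # True
print(math.isclose(2 * gaps[0], rho(r)))  # True: 2(r - s_r) equals rho
```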


Appendix 3 


In this appendix we prove that every two symmetric objects $a$ and $b$ such that
$h(a) = h(b)$ are reachable by a sequence of symmetric objects.

Let $a = (0, p^a_1, p^a_2, \ldots, p^a_{2n})$ and $b = (0, p^b_1, p^b_2, \ldots, p^b_{2n})$. Let the first object in the sequence
be $a$. The second object in the sequence, which we denote by $a'$, is the object $a$ scaled by a ratio of distances $d(\cdot,\cdot)$ between corresponding points of $b$ and of $a$. By our assumption $h(a) = h(b)$, the projections of $a'$ and $b$ on the $x$-$y$ plane are symmetric images that satisfy
$x^{a'}_i = x^b_i$ for every $i$.


Each object in the sequence consists of the same points as its previous object, except for
a pair of symmetric points of the object $a'$, which is replaced by a new pair of symmetric
points of the object $b$. The direction for which the two objects project to the same image
is the vector that connects one point of the pair of symmetric points from $a'$ with the
point of the pair of symmetric points from $b$ that has the same $x$ coordinate. Note that this vector is in the $y$-$z$ plane; hence the
symmetry of the image is kept.
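A small numerical check of this step, assuming (as the condition $x^{a'}_i = x^b_i$ and the $y$-$z$ plane suggest) that the plane of symmetry is $x = 0$. The points `q` and `s` are arbitrary illustrative values and the helper names are hypothetical. Because the joining vector has no $x$ component, it is its own mirror image. The mirror partners of `q` and `s` therefore coincide along the same direction, and the projected image stays symmetric.

```python
import math

def mirror(p):
    """Reflection through the plane x = 0 (assumed plane of symmetry)."""
    return (-p[0], p[1], p[2])

def project(p, v):
    """Orthographic projection of p onto the plane through the origin perpendicular to v."""
    t = sum(a * b for a, b in zip(p, v)) / sum(b * b for b in v)
    return tuple(a - t * b for a, b in zip(p, v))

q = (0.7, 0.2, -1.0)  # a point of a' (its mirror partner is mirror(q))
s = (0.7, -0.4, 0.9)  # the point of b with the same x coordinate
v = tuple(b - a for a, b in zip(q, s))  # joining vector; v[0] == 0, so v is in the y-z plane

print(v[0] == 0.0)  # True
print(math.dist(project(q, v), project(s, v)) < 1e-12)  # True
print(math.dist(project(mirror(q), v), project(mirror(s), v)) < 1e-12)  # True: partners coincide
print(math.dist(project(mirror(q), v), mirror(project(q, v))) < 1e-12)  # True: image symmetric
```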


Appendix 4 

In this appendix we show that every two objects a and b composed of n small surface 
patches are reachable by a sequence of objects starting with a and ending with b such 
that for each successive pair of objects there exists a common image. It will then follow 
that a nontrivial consistent recognition function with respect to illumination conditions
and viewing position does not exist.

The grey level at each image point is determined by the illumination conditions and by the
normal direction and reflectance value of the object point. Given two $n$-point objects,
$a$ and $b$, we construct the same sequence as in Claim 1, except that each new point of $b$
(which replaces a point of $a$) is assigned the normal direction and reflectance value of the
point of $a$ that it replaces. In this
manner we get a sequence of objects starting at $a$ and ending at some $b'$ such that each
pair of successive objects has a common image (the same locations and grey level values
of the image points). The objects $b$ and $b'$ have identical configurations of $n$ points in
space, but with possibly different normal directions and albedo values associated with
corresponding points. Therefore, the objects $b$ and $b'$ do not necessarily have a common
image.

It is left to show, then, that every two $n$-point objects having the same configuration
in space, with possibly different normal directions and albedo values, are reachable. That
is, there exists a sequence of objects such that for every two successive objects there is an
illumination condition under which the projected grey levels of the two objects are identical
at every point.

We construct a sequence such that the first and the last objects in the sequence
are $b'$ and $b$, respectively, and every two successive objects differ at only one point. Let
$p_{b'}$ and $p_b$ be the two non-identical points in a successive pair. Let $N_{b'}$ and $N_b$ be the
unit vectors in the normal directions, and let $\rho_{b'}$ and $\rho_b$ be the albedos of the points $p_{b'}$ and $p_b$,
respectively. We will assume first that the objects are Lambertian (for details regarding
the images of Lambertian surfaces see Horn 1977). In this case, the intensities at the
points are given by $I_{b'} = \rho_{b'} E \cdot N_{b'}$ and $I_b = \rho_b E \cdot N_b$, where $E$ is the light source vector
(its direction points at the equivalent light source, and its magnitude is proportional
to the source intensity). For the two objects to have a common image, it is sufficient to
find an illumination vector $E$ such that $p_{b'}$ and $p_b$ have identical grey levels, i.e.,
$\rho_b E \cdot N_b = \rho_{b'} E \cdot N_{b'}$. Such an $E$ clearly exists, because it is defined by one linear
equation in three variables. The vector $E$ should also satisfy $E \cdot N_{b'} > 0$ and $E \cdot N_b > 0$.
This is possible since, the albedos being positive, if $E \cdot N_b < 0$, then $E \cdot N_{b'} < 0$ as well, and we can choose
$-E$ for our final solution.⁴
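A sketch of the choice of $E$ under the Lambertian model above, with `n1`, `alb1` standing for $N_{b'}$, $\rho_{b'}$ and `n2`, `alb2` for $N_b$, $\rho_b$ (the function name `matching_light` is hypothetical). Instead of solving the linear equation and then flipping the sign, it takes $E$ to be the component of $N_{b'}$ orthogonal to $w = \rho_{b'} N_{b'} - \rho_b N_b$. This choice satisfies $E \cdot w = 0$, which is the equal-grey-level condition, and $E \cdot N_{b'} \ge 0$ by the Cauchy-Schwarz inequality; positive albedos then give $E \cdot N_b > 0$. The function returns `None` when $N_{b'}$ is parallel to $w$, which includes the equal-normal case of footnote 4.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_light(n1, alb1, n2, alb2):
    """Light vector E with alb1*E.n1 == alb2*E.n2 and E.n1 > 0, E.n2 > 0, or None.

    n1, n2: unit normals; alb1, alb2: positive albedos (Lambertian model I = alb * E.N).
    """
    w = tuple(alb1 * a - alb2 * b for a, b in zip(n1, n2))  # need E.w == 0
    ww = dot(w, w)
    if ww < 1e-15:
        return tuple(n1)  # identical normals and albedos: light along the normal works
    k = dot(n1, w) / ww
    e = tuple(a - k * b for a, b in zip(n1, w))  # n1 minus its component along w
    if dot(e, n1) <= 1e-12:
        return None  # n1 parallel to w: every E with E.w == 0 has E.n1 == 0
    return e

n1 = (0.0, 0.0, 1.0)
n2 = (math.sin(0.5), 0.0, math.cos(0.5))
E = matching_light(n1, 0.8, n2, 0.3)
print(math.isclose(0.8 * dot(E, n1), 0.3 * dot(E, n2)))  # True: equal grey levels
print(matching_light(n1, 0.8, n1, 0.3))  # None (same normal, different albedo)
```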

In case the object is not Lambertian but has a specular component, the intensity at 
each point becomes the sum of the Lambertian component and the specular component. 
The intensity value due to the specular component depends on the viewing position, the 
surface normal, the light source position, and some other surface specular parameters 
(Phong 1975). We have not considered this case in detail, but it seems that by choosing
the light source and the viewing position, two successive objects in the sequence can still have
a common image.

4 In case $N_{b'} = N_b$ but $\rho_{b'} \neq \rho_b$, the solution for $E$ is such that $E \cdot N_{b'} = 0$. We can then add
one intermediate object to the sequence, with albedo $\rho_b$ and a normal $N' \neq N_{b'}$.





References 


- Bolles, R.C. and Cain, R.A. 1982. Recognizing and locating partially visible objects: The local-features-focus method. Int. J. Robotics Research, 1(3), 57-82.

- Brooks, R.A. 1981. Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence J., 17, 285-348.

- Burns, J.B., Weiss, R. and Riseman, E.M. 1990. View variation of point set and line segment features. Proc. Image Understanding Workshop, Sep., 650-659.

- Cannon, S.R., Jones, G.W., Campbell, R. and Morgan, N.W. 1986. A computer vision system for identification of individuals. Proc. IECON 86, Vol. 1, 347-351.

- Clemens, D.J. and Jacobs, D.W. 1990. Model-group indexing for recognition. Proc. Image Understanding Workshop, Sep., 604-613.

- Cyganski, D., Cott, T.A., Orr, J.A. and Dodson, R.J. 1987. Development, implementation, testing and application of an affine transform invariant curvature function. Proceedings of ICCV Conf., London, 496-500.

- Grimson, W.E.L. and Lozano-Perez, T. 1984. Model-based recognition and localization from sparse data. Int. J. Robotics Research, 3(3), 3-35.

- Grimson, W.E.L. and Lozano-Perez, T. 1987. Localizing overlapping parts by searching the interpretation tree. IEEE Trans. on PAMI, 9(4), 469-482.

- Horn, B.K.P. 1977. Understanding image intensities. Artificial Intelligence J., 8(2), 201-231.

- Hu, M.K. 1961. Pattern recognition by moment invariants. Proc. IRE, 49, 1428.

- Hu, M.K. 1962. Visual pattern recognition by moment invariants. IRE Trans. Inform. Theory, IT-8, 179-187.

- Huttenlocher, D.P. and Ullman, S. 1987. Object recognition using alignment. Proceedings of ICCV Conf., London, 102-111.

- Kanade, T. 1977. Computer recognition of human faces. Birkhauser Verlag, Basel and Stuttgart.

- Khotanzad, A. and Hong, Y.H. 1990. Invariant image recognition by Zernike moments. IEEE Trans. on PAMI, 12(5), 489-497.

- Lamdan, Y. and Wolfson, H.J. 1988. Geometric hashing: A general and efficient model-based recognition scheme. Proceedings of ICCV Conf., Tampa, Florida, 238-249.

- Lin, C. 1987. New forms of shape invariants from elliptic Fourier descriptions. Pattern Recognition, 20(5), 535-545.

- Lowe, D.G. 1985. Three dimensional object recognition from single two-dimensional images. Robotics Research Technical Report 202, Courant Inst. of Math. Sciences, N.Y. University.

- Phong, B.T. 1975. Illumination for computer generated pictures. Communications of the ACM, 18(6), 311-317.

- Poggio, T. and Edelman, S. 1990. A network that learns to recognize three dimensional objects. Nature, 343, 263-266.

- Ponce, J., Chelberg, D. and Mann, W. 1985. Invariant properties of the projection of straight homogeneous generalized cylinders. Proceedings of ICCV Conf., London, 631-635.

- Ullman, S. 1977. Transformability and object identity. Perception and Psychophysics, 22(4), 414-415.

- Ullman, S. 1989. Aligning pictorial descriptions: an approach to object recognition. Cognition, 32(3), 193-254.

- Ullman, S. and Basri, R. 1989. Recognition by linear combinations of models. AI Memo No. 1152, The Artificial Intelligence Lab., M.I.T.

- Verri, A. and Yuille, A. 1986. Perspective projection invariants. AI Memo No. 832, The Artificial Intelligence Lab., M.I.T.

- Wong, K.H., Law, H.H.M. and Tsang, P.W.M. 1989. A system for recognizing human faces. Proc. ICASSP, 1638-1642.





REPORT DOCUMENTATION PAGE 




2. REPORT DATE

May 1991


4. TITLE AND SUBTITLE 

Limitations of Non Model-Based Recognition Schemes 


6. AUTHOR(S) 

Yael Moses and Shimon Ullman 


3. REPORT TYPE AND DATES COVERED 

memorandum


5. FUNDING NUMBERS 

N00014-86-K-0685 

DACA76-85-C-0010 

N00014-85-K-0124 

IRI-8900267 


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Artificial Intelligence Laboratory 
545 Technology Square 
Cambridge, Massachusetts 02139 


8. PERFORMING ORGANIZATION 
REPORT NUMBER 

AIM 1301 


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
Office of Naval Research 
Information Systems 
Arlington, Virginia 22217 





12a. DISTRIBUTION/AVAILABILITY STATEMENT 

Distribution of this document is unlimited 




13. ABSTRACT (Maximum 200 words) 

Abstract: Different approaches to visual object recognition can be divided into 
two general classes: model-based vs. non model-based schemes. In this paper we 
establish some limitations on the class of non model-based recognition schemes. A
non model-based scheme is based on functions invariant to viewing position and 
illumination conditions. We show that every function that is invariant to viewing 
position of all objects is the trivial (constant) function. The same result holds 
even if the recognition function is not required to be perfect, but is allowed to 
make mistakes and misidentify each object from a substantial fraction of viewing 
directions. It follows that every consistent recognition scheme for recognizing 3-D 
objects must in general be model based. 


14. SUBJECT TERMS (keywords) 

recognition invariant properties 

object recognition model-based vision 


17. SECURITY CLASSIFICATION 
OF REPORT 

UNCLASSIFIED 


18. SECURITY CLASSIFICATION 
OF THIS PAGE 

UNCLASSIFIED 




15. NUMBER OF PAGES 

21 




19. SECURITY CLASSIFICATION 
OF ABSTRACT 

UNCLASSIFIED 


20. LIMITATION OF ABSTRACT 

UNCLASSIFIED 







Block 13 continued: 


We then consider recognition schemes restricted to classes of objects and show 
that, for some classes, the only consistent recognition function is still the trivial 
function. For other classes (such as the class of symmetric objects) a nontrivial 
recognition scheme exists. We define the notion of a discrimination power of a 
consistent recognition function for a class of objects. The function’s discrimination 
power determines the set of objects that can be discriminated by the recognition 
function. We show that it is possible to determine the upper bound of the function’s 
discrimination power for every consistent recognition function. 






